AITopics | problem accuracy


problem accuracy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision

Sun, Zhiqing, Yu, Longhui, Shen, Yikang, Liu, Weiyang, Yang, Yiming, Welleck, Sean, Gan, Chuang

arXiv.org Artificial Intelligence, Mar-14-2024

Current AI alignment methodologies rely on human-provided demonstrations or judgments, so the learned capabilities of AI systems are upper-bounded by human capabilities. This raises a challenging research question: how can we keep improving these systems once their capabilities have surpassed human levels? This paper answers the question in the context of hard reasoning tasks (e.g., level 4-5 MATH problems) by learning from human annotations on easier tasks (e.g., level 1-3 MATH problems), which we term easy-to-hard generalization. Our key insight is that an evaluator (reward model) trained on supervision for easier tasks can be used effectively to score candidate solutions to harder tasks, and hence to facilitate easy-to-hard generalization across task levels. Based on this insight, we propose a novel approach to scalable alignment, which first trains process-supervised reward models on easy problems (e.g., level 1-3) and then uses them to evaluate the performance of policy models on hard problems. We show that such easy-to-hard generalization from evaluators can enable easy-to-hard generalization in generators, through either re-ranking or reinforcement learning (RL). Notably, our process-supervised 7B RL model achieves an accuracy of 34.0% on MATH500 despite using human supervision only on easy problems. Our approach suggests a promising path toward AI systems that advance beyond the frontier of human supervision.
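
The re-ranking route mentioned in this abstract can be illustrated with a minimal sketch: sample several candidate solutions, score each with an evaluator, and keep the best one. Everything named below is hypothetical and not taken from the paper: `toy_step_score` stands in for a trained process-supervised reward model, and aggregating step scores with `min` is only one assumed way to turn per-step scores into a solution score.

```python
from typing import Callable, List, Sequence

Solution = List[str]  # a candidate solution as a list of reasoning steps


def score_solution(steps: Solution, step_score_fn: Callable[[str], float]) -> float:
    """Aggregate per-step scores; taking the minimum is an assumption of this sketch."""
    return min(step_score_fn(step) for step in steps)


def rerank_best_of_n(
    candidates: Sequence[Solution], step_score_fn: Callable[[str], float]
) -> Solution:
    """Return the candidate whose aggregated evaluator score is highest."""
    if not candidates:
        raise ValueError("need at least one candidate solution")
    return max(candidates, key=lambda steps: score_solution(steps, step_score_fn))


def toy_step_score(step: str) -> float:
    """Hypothetical stand-in for a reward model: checks steps of the form 'a + b = c'."""
    left, right = step.split("=")
    a, b = left.split("+")
    return 1.0 if int(a) + int(b) == int(right) else 0.0


candidates = [
    ["2 + 3 = 6", "6 + 1 = 7"],  # first step is wrong
    ["2 + 3 = 5", "5 + 1 = 6"],  # every step checks out
]
best = rerank_best_of_n(candidates, toy_step_score)
```

In this toy setting the evaluator only needs to judge individual steps, which is the intuition behind training it on easier problems and applying it to harder ones.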

arxiv preprint arxiv, language model, problem accuracy, (12 more...)

arXiv.org Artificial Intelligence

2403.09472

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre: Research Report > New Finding (0.87)

Industry: Education (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)


ACRE: Abstract Causal REasoning Beyond Covariation

Zhang, Chi, Jia, Baoxiong, Edmonds, Mark, Zhu, Song-Chun, Zhu, Yixin

arXiv.org Artificial Intelligence, Mar-25-2021

Causal induction, i.e., identifying unobservable mechanisms that lead to the observable relations among variables, has played a pivotal role in modern scientific discovery, especially in scenarios with only sparse and limited data. Humans, even young toddlers, can induce causal relationships surprisingly well in various settings despite the notorious difficulty of the task. In contrast to this commonplace trait of human cognition, however, there is no diagnostic benchmark for measuring causal induction in modern Artificial Intelligence (AI) systems. Therefore, in this work, we introduce the Abstract Causal REasoning (ACRE) dataset for systematic evaluation of current vision systems in causal induction. Motivated by the stream of research on causal discovery in Blicket experiments, we query a visual reasoning system with the following four types of questions in either an independent scenario or an interventional scenario: direct, indirect, screening-off, and backward-blocking, intentionally going beyond the simple strategy of inducing causal relationships by covariation. By analyzing visual reasoning architectures on this testbed, we notice that pure neural models tend towards an associative strategy under their chance-level performance, whereas neuro-symbolic combinations struggle in backward-blocking reasoning. These deficiencies call for future research in models with a more comprehensive capability of causal induction.

causal induction, query, reasoning, (13 more...)

arXiv.org Artificial Intelligence

2103.14232

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Mississippi (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (0.74)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.68)
(2 more...)


Time Series Classification via Topological Data Analysis

Karan, Alperen, Kaygun, Atabey

arXiv.org Machine Learning, Feb-3-2021

In this study, we use persistent homology to perform classification tasks on two publicly available multivariate time series datasets [19, 11] that include physiological data collected during stressful and non-stressful tasks. Instead of directly computing signal-specific features from sliding windows and subwindows on modalities such as electrocardiogram and wrist temperature (Figure 7), we extracted features using persistence diagrams and their statistical properties. The subwindowing method allowed us to reduce noise without incurring an extra computational cost. We then developed machine learning models and assessed their performance by varying window sizes and using different flavors of persistence diagrams. Topological Data Analysis (TDA) techniques usually work with points embedded in an affine space of large enough dimension. However, TDA techniques can still be applied to time series data sets, whether they are univariate or multivariate. One can convert a univariate time series into a finite collection of points in a finite-dimensional affine space using delay embedding methods, and then compute the persistent homology of that point collection. Since Takens' Theorem implies that delay embeddings produce topologically invariant subsets on a non-chaotic dynamical system [21], one can reasonably expect that persistent homology produces features that would distinguish different time series. A handful of studies address the persistent homology of delay embeddings for time series classification [23, 20, 1].
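
The delay embedding step described in this abstract can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the authors' implementation: the function name `delay_embed` and the parameters `dim` (embedding dimension) and `delay` (lag between coordinates) are assumptions introduced here. Each output point collects `dim` samples of the series spaced `delay` steps apart.

```python
from typing import List, Sequence, Tuple


def delay_embed(
    series: Sequence[float], dim: int, delay: int = 1
) -> List[Tuple[float, ...]]:
    """Map a univariate series to points (x[i], x[i+delay], ..., x[i+(dim-1)*delay])."""
    if dim < 1 or delay < 1:
        raise ValueError("dim and delay must be positive integers")
    n_points = len(series) - (dim - 1) * delay
    if n_points < 1:
        raise ValueError("series is too short for this dim and delay")
    return [
        tuple(series[i + j * delay] for j in range(dim))
        for i in range(n_points)
    ]


# A five-sample series embedded in three dimensions with lag 1
points = delay_embed([0, 1, 2, 3, 4], dim=3, delay=1)
# points == [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
```

The resulting point cloud is what a persistent homology routine would take as input; that computation is left out here because it depends on the TDA library chosen.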

accuracy, persistence diagram, persistent homology, (13 more...)

arXiv.org Machine Learning

2102.01956

Country: Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report > New Finding (0.89)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.48)
Health & Medicine > Diagnostic Medicine (0.48)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.46)
